---
title: "Part 2: Spatial & Statistical Analysis"
subtitle: "Understanding Distribution Patterns and Grade Characteristics"
author: "Ghozian Islam Karami"
date: "2025-10-03"
categories: [EDA, Spatial Analysis, Statistics, Visualization]
image: "Part2.png"
format:
html:
code-fold: true
code-tools: true
toc: true
toc-depth: 3
---
## Introduction
With validated data from [Part 1](/posts/Data Validation/), we now explore **where** the data is located and **what** it tells us statistically. This post covers:
- **Pillar 2**: Spatial Distribution Analysis
- **Pillar 3**: Statistical Characterization
These pillars help us understand sampling density, identify spatial patterns, and characterize grade distributions - all critical for informed estimation decisions.
## Setup and Data Loading
```{r}
#| message: false
#| warning: false
library(dplyr)
library(tidyr)
library(ggplot2)
library(plotly)
library(DT)
library(RColorBrewer)
library(patchwork)
# Create simulated drilling data
set.seed(123)
n_holes <- 50
collar <- data.frame(
hole_id = paste0("DDH", sprintf("%03d", 1:n_holes)),
x = runif(n_holes, 500000, 501000),
y = runif(n_holes, 9000000, 9001000),
rl = runif(n_holes, 100, 200)
)
# Assay data with spatial correlation
assay_list <- lapply(1:n_holes, function(i) {
n_intervals <- sample(15:25, 1)
depths <- seq(0, by = 2, length.out = n_intervals)
# Add spatial correlation to grades
distance_from_center <- sqrt((collar$x[i] - 500500)^2 + (collar$y[i] - 9000500)^2)
grade_factor <- exp(-distance_from_center / 300)
data.frame(
hole_id = collar$hole_id[i],
from = depths[-length(depths)],
to = depths[-1],
au_ppm = pmax(0, rnorm(n_intervals - 1, mean = 1.5 * grade_factor, sd = 2)),
ag_ppm = pmax(0, rnorm(n_intervals - 1, mean = 15 * grade_factor, sd = 20)),
cu_pct = pmax(0, rnorm(n_intervals - 1, mean = 0.5 * grade_factor, sd = 0.8))
)
})
assay <- do.call(rbind, assay_list)
# Lithology data
litho_codes <- c("Andesite", "Diorite", "Mineralized_Zone", "Altered_Volcanics")
lithology_list <- lapply(collar$hole_id, function(hid) {
n_litho <- sample(4:8, 1)
depths <- sort(c(0, sample(5:40, n_litho - 1), 50))
data.frame(
hole_id = hid,
from = depths[-length(depths)],
to = depths[-1],
lithology = sample(litho_codes, n_litho, replace = TRUE,
prob = c(0.3, 0.2, 0.3, 0.2))
)
})
lithology <- do.call(rbind, lithology_list)
# Clean and prepare data
collar <- collar %>% janitor::clean_names() %>%
select(hole_id, x, y, z = rl) %>%
mutate(hole_id = as.character(hole_id))
assay <- assay %>% janitor::clean_names() %>%
mutate(hole_id = as.character(hole_id))
lithology <- lithology %>% janitor::clean_names() %>%
select(hole_id, from, to, lithology) %>%
mutate(hole_id = as.character(hole_id))
# Merge datasets
collar_std <- collar
assay_std <- assay
lithology_std <- lithology
combined_data <- assay_std %>%
left_join(collar_std, by = "hole_id") %>%
mutate(mid_point = from + (to - from) / 2) %>%
left_join(
lithology_std %>% rename(litho_from = from, litho_to = to),
by = join_by(hole_id, between(mid_point, litho_from, litho_to))
) %>%
select(-mid_point, -litho_from, -litho_to)
```
---
# PILLAR 2: Spatial Distribution Analysis
## Objective
Understand **where** your data is located in 3D space:
- Are drillholes evenly distributed?
- Where are the dense vs sparse areas?
- What are the initial mineralization trends?
- Is there clustering or bias in sampling?
::: {.callout-tip}
## Why Spatial Analysis Matters
Spatial patterns directly impact:
- Kriging neighborhood selection
- Search ellipse orientation
- Estimation variance
- Classification confidence (Measured, Indicated, Inferred)
:::
## 2D Drillhole Location Map
Let's start with a plan view showing collar locations.
### Basic Collar Map
```{r}
#| fig-width: 10
#| fig-height: 8
# Create 2D plan view
p_2d <- ggplot(collar_std, aes(x = x, y = y)) +
geom_point(size = 3, alpha = 0.7, color = "steelblue") +
geom_text(aes(label = hole_id),
hjust = -0.2, vjust = 0.5,
size = 2.5,
check_overlap = TRUE) +
coord_equal() +
labs(
title = "2D Drillhole Plan View",
subtitle = paste(nrow(collar_std), "drillholes"),
x = "Easting (m)",
y = "Northing (m)"
) +
theme_minimal() +
theme(
plot.title = element_text(size = 14, face = "bold"),
panel.grid.minor = element_blank()
)
p_2d
```
::: {.callout-note}
## Spatial Pattern Observations
Look for:
- **Clustering**: Groups of closely spaced holes
- **Gaps**: Areas with no drilling
- **Grid pattern**: Regular vs irregular spacing
- **Directional bias**: Preferential drilling directions
:::
### Interactive 2D Map with Average Grades
```{r}
#| fig-width: 10
#| fig-height: 8
# Calculate average grade per hole (example: using au_ppm)
avg_grades <- combined_data %>%
group_by(hole_id, x, y) %>%
summarise(
avg_au = mean(au_ppm, na.rm = TRUE),
max_au = max(au_ppm, na.rm = TRUE),
.groups = 'drop'
) %>%
filter(!is.na(avg_au))
# Create interactive plotly map
plot_ly(data = avg_grades,
x = ~x, y = ~y,
type = 'scatter', mode = 'markers',
marker = list(
size = 10,
color = ~avg_au,
colorscale = 'Viridis',
showscale = TRUE,
colorbar = list(title = "Avg Au<br>(ppm)")
),
text = ~paste0(
"Hole: ", hole_id, "<br>",
"Avg Au: ", round(avg_au, 3), " ppm<br>",
"Max Au: ", round(max_au, 3), " ppm"
),
hoverinfo = 'text') %>%
layout(
title = "2D Drillhole Map: Average Gold Grades",
xaxis = list(title = "Easting (m)"),
yaxis = list(
title = "Northing (m)",
scaleanchor = "x",
scaleratio = 1
)
)
```
### Sampling Density Analysis
```{r}
#| fig-width: 10
#| fig-height: 4
# Calculate local sampling density
density_estimate <- MASS::kde2d(
collar_std$x,
collar_std$y,
n = 50
)
# Convert to data frame for ggplot
density_df <- expand.grid(
x = density_estimate$x,
y = density_estimate$y
) %>%
mutate(density = as.vector(density_estimate$z))
p_density <- ggplot() +
geom_tile(data = density_df, aes(x = x, y = y, fill = density)) +
scale_fill_viridis_c(option = "plasma", name = "Density") +
geom_point(data = collar_std, aes(x = x, y = y),
size = 2, color = "white", alpha = 0.6) +
coord_equal() +
labs(
title = "Sampling Density Heatmap",
subtitle = "Darker areas indicate higher drillhole density",
x = "Easting (m)",
y = "Northing (m)"
) +
theme_minimal()
p_density
```
::: {.callout-important}
## Classification Implications
Sampling density directly impacts resource classification:
- **Dense areas**: Higher confidence → Measured/Indicated
- **Sparse areas**: Lower confidence → Indicated/Inferred
- **No data areas**: Extrapolation risk
:::
## Composite Length Analysis
Understanding sample support is critical for geostatistics.
```{r}
#| fig-width: 10
#| fig-height: 5
# Calculate composite lengths
combined_data <- combined_data %>%
mutate(composite_length = to - from)
# Summary statistics
length_summary <- combined_data %>%
summarise(
`Min Length` = min(composite_length, na.rm = TRUE),
`Q1` = quantile(composite_length, 0.25, na.rm = TRUE),
`Median` = median(composite_length, na.rm = TRUE),
`Mean` = mean(composite_length, na.rm = TRUE),
`Q3` = quantile(composite_length, 0.75, na.rm = TRUE),
`Max Length` = max(composite_length, na.rm = TRUE),
`Std Dev` = sd(composite_length, na.rm = TRUE)
)
datatable(length_summary,
caption = "Table 1: Composite Length Statistics (meters)",
options = list(dom = 't')) %>%
formatRound(columns = 1:7, digits = 2)
# Histogram
p_length <- ggplot(combined_data, aes(x = composite_length)) +
geom_histogram(bins = 30, fill = "skyblue", color = "black", alpha = 0.7) +
geom_vline(xintercept = mean(combined_data$composite_length, na.rm = TRUE),
color = "red", linetype = "dashed", size = 1) +
annotate("text",
x = mean(combined_data$composite_length, na.rm = TRUE),
y = Inf,
label = paste("Mean:", round(mean(combined_data$composite_length, na.rm = TRUE), 2), "m"),
vjust = 2, color = "red", fontface = "bold") +
labs(
title = "Composite Length Distribution",
subtitle = "Check for consistency in sample support",
x = "Composite Length (m)",
y = "Frequency"
) +
theme_minimal()
p_length
```
::: {.callout-warning}
## Variable Sample Support Issues
Inconsistent sample lengths can cause:
- **Support effect**: Grade variance differences
- **Bias in estimation**: Longer samples overweighted
- **Variogram artifacts**: False spatial continuity
**Solution**: Composite to a consistent length (typically 1m or 2m)
:::
## 3D Visualization
Understanding the vertical distribution of data is essential.
```{r}
#| fig-width: 10
#| fig-height: 8
# Prepare 3D data with depth
data_3d <- combined_data %>%
mutate(z_sample = z - ((from + to) / 2)) %>%
filter(!is.na(x) & !is.na(y) & !is.na(z_sample) & !is.na(au_ppm))
# Sample data for performance (use 50% of data)
set.seed(123)
data_3d_sample <- sample_frac(data_3d, 0.5)
# Create 3D scatter plot
plot_ly(
data = data_3d_sample,
x = ~x, y = ~y, z = ~z_sample,
type = 'scatter3d', mode = 'markers',
marker = list(
color = ~au_ppm,
colorscale = 'Viridis',
colorbar = list(title = "Au (ppm)"),
size = 3
),
text = ~paste0(
"Hole: ", hole_id, "<br>",
"Depth: ", round(from, 1), "-", round(to, 1), "m<br>",
"Au: ", round(au_ppm, 3), " ppm"
),
hoverinfo = 'text'
) %>%
layout(
title = "3D Drillhole Assay View",
scene = list(
xaxis = list(title = "Easting"),
yaxis = list(title = "Northing"),
zaxis = list(title = "Elevation (RL)", autorange = "reversed"),
aspectratio = list(x = 1, y = 1, z = 0.5)
)
)
```
::: {.callout-tip}
## 3D Visualization Benefits
3D views help identify:
- Vertical trends in mineralization
- Dip and plunge of ore bodies
- Drilling depth coverage
- Spatial clustering in 3D space
:::
---
# PILLAR 3: Statistical Characterization
## Objective
Understand the **"personality"** of your grade data:
- What does the distribution look like?
- Are there outliers?
- Single or multiple populations?
- What top-cut value is appropriate?
## Univariate Statistics
### Summary Statistics by Grade Variable
```{r}
# Select numeric grade columns
grade_cols <- c("au_ppm", "ag_ppm", "cu_pct")
# Calculate comprehensive statistics
stats_summary <- combined_data %>%
select(all_of(grade_cols)) %>%
pivot_longer(everything(), names_to = "Grade", values_to = "Value") %>%
group_by(Grade) %>%
summarise(
Count = sum(!is.na(Value)),
Mean = mean(Value, na.rm = TRUE),
Median = median(Value, na.rm = TRUE),
`Std Dev` = sd(Value, na.rm = TRUE),
CV = sd(Value, na.rm = TRUE) / mean(Value, na.rm = TRUE),
Min = min(Value, na.rm = TRUE),
Q1 = quantile(Value, 0.25, na.rm = TRUE),
Q3 = quantile(Value, 0.75, na.rm = TRUE),
Max = max(Value, na.rm = TRUE),
Skewness = e1071::skewness(Value, na.rm = TRUE)
) %>%
mutate(across(where(is.numeric), ~round(.x, 3)))
datatable(stats_summary,
caption = "Table 2: Comprehensive Grade Statistics",
options = list(scrollX = TRUE, pageLength = 10))
```
::: {.callout-note}
## Key Statistical Indicators
- **CV (Coefficient of Variation)**: Measure of variability (CV > 1 indicates high variability)
- **Skewness**: Positive skew common in grade data (long right tail)
- **Mean vs Median**: Large differences suggest outliers
:::
## Grade Distribution Analysis
### Histograms
```{r}
#| fig-width: 10
#| fig-height: 8
# Create histograms for each grade
plot_list <- lapply(grade_cols, function(col) {
ggplot(combined_data, aes(x = .data[[col]])) +
geom_histogram(bins = 50, fill = "skyblue", color = "black", alpha = 0.7) +
geom_vline(xintercept = mean(combined_data[[col]], na.rm = TRUE),
color = "red", linetype = "dashed", size = 1) +
geom_vline(xintercept = median(combined_data[[col]], na.rm = TRUE),
color = "blue", linetype = "dashed", size = 1) +
labs(
title = paste("Distribution of", col),
x = col,
y = "Frequency"
) +
theme_minimal() +
theme(plot.title = element_text(size = 10))
})
# Combine plots
(plot_list[[1]] | plot_list[[2]]) / plot_list[[3]]
```
### Log-Normal Probability Plots
Essential for identifying populations and assessing normality.
```{r}
#| fig-width: 10
#| fig-height: 6
# Create Q-Q plots for log-transformed data
qq_plots <- lapply(grade_cols, function(col) {
data_subset <- combined_data %>%
filter(!is.na(.data[[col]]) & .data[[col]] > 0)
ggplot(data_subset, aes(sample = log(.data[[col]]))) +
stat_qq(color = "steelblue", alpha = 0.5) +
stat_qq_line(color = "red", size = 1) +
labs(
title = paste("Log-Normal Q-Q Plot:", col),
x = "Theoretical Quantiles",
y = "Sample Quantiles (log scale)"
) +
theme_minimal()
})
(qq_plots[[1]] | qq_plots[[2]]) / qq_plots[[3]]
```
::: {.callout-important}
## Population Identification
Breaks or changes in slope on Q-Q plots indicate:
- **Multiple populations**: Different geological/mineralization domains
- **Outliers**: High-grade samples requiring investigation
- **Mixed distributions**: Need for domain separation
**Action**: Use this information for domaining decisions!
:::
## Outlier Detection and Analysis
### Boxplots by Lithology
```{r}
#| fig-width: 10
#| fig-height: 6
# Focus on Au for outlier analysis
p_box <- ggplot(combined_data, aes(x = lithology, y = au_ppm, fill = lithology)) +
geom_boxplot(outlier.colour = "red", outlier.shape = 16, outlier.size = 2) +
scale_fill_brewer(palette = "Set2") +
labs(
title = "Gold Grade Distribution by Lithology",
subtitle = "Red points indicate statistical outliers (>1.5 IQR)",
x = "Lithology",
y = "Au (ppm)"
) +
theme_minimal() +
theme(
axis.text.x = element_text(angle = 45, hjust = 1),
legend.position = "none"
)
p_box
```
### Outlier Table
```{r}
# Calculate IQR-based outliers
outlier_threshold <- combined_data %>%
summarise(
Q1 = quantile(au_ppm, 0.25, na.rm = TRUE),
Q3 = quantile(au_ppm, 0.75, na.rm = TRUE)
) %>%
mutate(
IQR = Q3 - Q1,
lower_bound = Q1 - 1.5 * IQR,
upper_bound = Q3 + 1.5 * IQR
)
outliers <- combined_data %>%
filter(au_ppm > outlier_threshold$upper_bound |
au_ppm < outlier_threshold$lower_bound) %>%
select(hole_id, from, to, lithology, au_ppm) %>%
arrange(desc(au_ppm))
datatable(outliers,
caption = paste("Table 3: Outlier Samples (n =", nrow(outliers), ")"),
options = list(pageLength = 10, scrollX = TRUE)) %>%
formatStyle('au_ppm', backgroundColor = '#ffebee', fontWeight = 'bold')
```
::: {.callout-warning}
## Outlier Treatment Options
1. **Keep as-is**: If geologically justified
2. **Top-cut (cap)**: Limit maximum grade
3. **Remove**: If analytical errors suspected
4. **Investigate**: Check for sample preparation issues
**Never** remove outliers without geological justification!
:::
## Top-Cut Analysis
Determining appropriate grade capping values.
```{r}
#| fig-width: 10
#| fig-height: 5
# Calculate percentiles
percentiles <- seq(0.9, 1.0, by = 0.01)
percentile_values <- quantile(combined_data$au_ppm, probs = percentiles, na.rm = TRUE)
percentile_df <- data.frame(
Percentile = percentiles * 100,
Grade = percentile_values
)
# Plot grade vs percentile
p_percentile <- ggplot(percentile_df, aes(x = Percentile, y = Grade)) +
geom_line(color = "steelblue", size = 1.5) +
geom_point(size = 2, color = "darkblue") +
geom_vline(xintercept = c(95, 97.5, 99),
linetype = "dashed",
color = c("orange", "red", "darkred")) +
annotate("text", x = 95, y = max(percentile_values) * 0.9,
label = "P95", color = "orange") +
annotate("text", x = 97.5, y = max(percentile_values) * 0.9,
label = "P97.5", color = "red") +
annotate("text", x = 99, y = max(percentile_values) * 0.9,
label = "P99", color = "darkred") +
labs(
title = "Top-Cut Analysis: Grade vs Percentile",
subtitle = "Common thresholds: P95, P97.5, P99",
x = "Percentile (%)",
y = "Au Grade (ppm)"
) +
theme_minimal()
p_percentile
```
### Impact of Top-Cutting
```{r}
# Compare statistics with different top-cuts
top_cuts <- c(Inf,
quantile(combined_data$au_ppm, 0.95, na.rm = TRUE),
quantile(combined_data$au_ppm, 0.975, na.rm = TRUE),
quantile(combined_data$au_ppm, 0.99, na.rm = TRUE))
impact_df <- data.frame(
Scenario = c("No Top-Cut", "P95 Cut", "P97.5 Cut", "P99 Cut"),
`Cut Value` = round(top_cuts, 2),
Mean = sapply(top_cuts, function(tc) {
mean(pmin(combined_data$au_ppm, tc), na.rm = TRUE)
}),
Median = sapply(top_cuts, function(tc) {
median(pmin(combined_data$au_ppm, tc), na.rm = TRUE)
}),
`Std Dev` = sapply(top_cuts, function(tc) {
sd(pmin(combined_data$au_ppm, tc), na.rm = TRUE)
}),
CV = sapply(top_cuts, function(tc) {
sd(pmin(combined_data$au_ppm, tc), na.rm = TRUE) /
mean(pmin(combined_data$au_ppm, tc), na.rm = TRUE)
})
) %>%
mutate(across(where(is.numeric), ~round(.x, 3)))
datatable(impact_df,
caption = "Table 4: Impact of Top-Cutting on Statistics",
options = list(dom = 't'))
```
::: {.callout-tip}
## Selecting Top-Cut Values
Consider:
1. **Statistical analysis**: Probability plots, percentiles
2. **Geological context**: Are high grades real or analytical errors?
3. **Impact on resources**: Balance between grade and tonnage
4. **Industry standards**: P95-P99 common for precious metals
5. **Domaining**: Different top-cuts for different domains
**Document your decision** with geological and statistical justification!
:::
## Grade Distribution by Lithology
Understanding how grades vary by rock type.
```{r}
#| fig-width: 10
#| fig-height: 8
# Faceted histograms by lithology
ggplot(combined_data, aes(x = au_ppm)) +
geom_histogram(bins = 30, fill = "skyblue", color = "black", alpha = 0.7) +
facet_wrap(~ lithology, scales = "free_y", ncol = 2) +
labs(
title = "Gold Grade Distribution by Lithology",
x = "Au (ppm)",
y = "Frequency"
) +
theme_minimal() +
theme(strip.text = element_text(face = "bold", size = 10))
```
### Statistical Comparison by Lithology
```{r}
litho_stats <- combined_data %>%
group_by(lithology) %>%
summarise(
Count = n(),
Mean = mean(au_ppm, na.rm = TRUE),
Median = median(au_ppm, na.rm = TRUE),
`Std Dev` = sd(au_ppm, na.rm = TRUE),
CV = sd(au_ppm, na.rm = TRUE) / mean(au_ppm, na.rm = TRUE),
Max = max(au_ppm, na.rm = TRUE)
) %>%
arrange(desc(Mean)) %>%
mutate(across(where(is.numeric) & !Count, ~round(.x, 3)))
datatable(litho_stats,
caption = "Table 5: Grade Statistics by Lithology",
options = list(pageLength = 10)) %>%
formatStyle(
'Mean',
background = styleColorBar(litho_stats$Mean, 'lightblue'),
backgroundSize = '100% 90%',
backgroundRepeat = 'no-repeat',
backgroundPosition = 'center'
)
```
::: {.callout-note}
## Geological Insights
This analysis reveals:
- **Host rocks**: Which lithologies contain mineralization?
- **Barren zones**: Low-grade lithologies to exclude?
- **Grade differences**: Justification for separate domains?
- **Continuity**: Are grades consistent within lithology?
:::
## Summary: Pillars 2 & 3
### Key Achievements
**Spatial Analysis (Pillar 2):**
✓ Visualized drillhole distribution in 2D and 3D
✓ Identified sampling density patterns
✓ Analyzed composite length consistency
✓ Recognized spatial trends in mineralization
**Statistical Analysis (Pillar 3):**
✓ Characterized grade distributions
✓ Identified outliers and proposed top-cuts
✓ Compared statistics by lithology
✓ Assessed data populations
### Checklist: Before Moving to Pillar 4
- [ ] Spatial patterns understood and documented
- [ ] Sampling density adequate for classification
- [ ] Grade populations identified
- [ ] Outliers investigated and treatment decided
- [ ] Top-cut values proposed with justification
- [ ] Lithology-grade relationships documented
::: {.callout-tip}
## Ready for Geological Controls?
With spatial and statistical understanding complete, you're ready to connect these patterns to geology in [Part 3: Geological Controls & Domain Definition](/posts/Geological Domain/).
:::
## Best Practices Recap
### Common Pitfalls to Avoid
1. **Ignoring spatial patterns** - Statistics alone aren't enough
2. **Over-relying on automation** - Always apply geological thinking
3. **Arbitrary top-cutting** - Document geological justification
4. **Skipping lithology analysis** - Rock types control grades
### Documentation Requirements
For JORC compliance, document:
- Spatial distribution maps and density analysis
- Statistical summary by domain/lithology
- Outlier treatment decisions with justification
- Top-cut analysis and selected values
- Population identification methodology
---
*Previous: [← Part 1 - Data Validation](/posts/Data Validation/)*
*Next: [Part 3 - Geological Controls & Domain Definition →](/posts/Geological Domain/)*